A Spoken Document Retrieval System for TV Broadcast News in Spanish and Basque
Abstract
This paper presents Hearch, a spoken document retrieval system that looks like a conventional search tool and retrieves audio/video segments based on the automatic transcription of their speech content. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end that lets users search content, configure various modules and display performance statistics through a web interface. An early version of the tool is available (http://gtts.ehu.es/Hearch/); it searches and retrieves segments from TV broadcast news repositories in Spanish and Basque. To evaluate the performance of the system, six manually transcribed TV news broadcasts in Spanish and seven in Basque were used. An approach based on extending the query with so-called friendly terms was proposed and evaluated, in an attempt to minimize the effect of errors introduced by the Automatic Speech Recognition module. This approach led to slight performance improvements.
Related resources
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
Search and access to information contained in the speech of multimedia resources
The main goal of this project is to make scientific contributions and technological improvements related to the spoken document retrieval system (Hearch) developed by the Working Group on Software Technologies of the University of the Basque Country. Hearch looks like a conventional search tool (such as Google, Bing, etc) but it is designed to retrieve audio/video segments based on the automati...
Recognition, indexing and retrieval of British broadcast news with the THISL system
This paper describes the THISL spoken document retrieval system for British and North American Broadcast News. The system is based on the ABBOT large vocabulary speech recognizer and a probabilistic text retrieval system. We discuss the development of a realtime British English Broadcast News system, and its integration into a spoken document retrieval system. Detailed evaluation is performed u...
Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish
The development of an automatic index system of broadcast news requires appropriate Video and Language Resources (LR) to design all the components of the system. Nowadays, large and well-defined resources can be found in most widely used languages (Informedia), but there is a lot of work to do with respect to minority languages. The main goal of this work is the design of resources in Basque an...
The Cambridge University spoken document retrieval system
This paper describes the spoken document retrieval system that we have been developing and assesses its performance using automatic transcriptions of about 50 hours of broadcast news data. The recognition engine is based on the HTK broadcast news transcription system and the retrieval engine is based on the techniques developed at City University. The retrieval performance over a wide range of ...
Journal:
- Procesamiento del Lenguaje Natural
Volume 47
Publication date: 2011